Recent Twitch literature has begun to characterize Twitch communities through chat, viewership trends, and content, but no known projects have used resources outside of Twitch to understand how Twitch communities manifest and interact with one another.
One subreddit, LivestreamFail (LSF), is dedicated to sharing Twitch clips, general Twitch news, and Twitch drama. LivestreamFail is one way smaller streamers get noticed, and it is a platform that I can use to compare big and small communities. I’m interested in how emote use differs between smaller and larger communities because I believe emote meanings and sentiments are being actively redefined. This analysis will be split into an LSF part and a Twitch emote-sentiment part.
This analysis will investigate users, posts, and comments (sentiment and topic) on r/LivestreamFail, and then investigate the comment data and emote use from Twitch clips that were featured in LSF posts.
library(pacman)
p_load(tidyverse,
tidytext,
tm,
lubridate,
stringr,
text2vec,
jsonlite,
widyr,
quanteda,
visNetwork,
igraph,
ggraph,
DT,
ggthemes)
LSF STUFF SHOULD GO HERE.
tl;dr - Twitch DMCA issues, streamer bans, and the app I used prevented me from downloading a lot more data.
Reddit data was gathered using Python and PRAW (Python Reddit API Wrapper) to pull recent posts from r/LivestreamFail (October 2020). This resulted in over 900 Reddit posts. This data was then used in R to scrape links and document whether the clips had chat available for download.
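As a rough illustration of the link-filtering step, here is a minimal base R sketch. The `posts` data frame, its column names, and the example URLs are hypothetical stand-ins for the PRAW export, and the pattern assumes clips are linked in the `clips.twitch.tv/<slug>` form; it is not the code used for this project.

```r
# Hypothetical stand-in for the PRAW export; real column names may differ.
posts <- data.frame(
  title = c("clip one", "news thread", "clip two"),
  url = c(
    "https://clips.twitch.tv/ExampleSlugOne",
    "https://www.reddit.com/r/LivestreamFail/comments/abc123/",
    "https://clips.twitch.tv/ExampleSlugTwo?tt_medium=redt"
  ),
  stringsAsFactors = FALSE
)

# Keep only posts whose URL points at a Twitch clip (assumed URL form).
clip_pattern <- "^https?://clips\\.twitch\\.tv/([A-Za-z0-9_-]+)"
is_clip <- grepl(clip_pattern, posts$url)
clips <- posts[is_clip, ]

# Pull out the clip slug, dropping any query string after it.
clips$slug <- sub(paste0(clip_pattern, ".*$"), "\\1", clips$url)
```

The slug is the piece a chat downloader would need, so keeping it as its own column makes it easy to record later whether chat was actually available for each clip.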
To actually download the Twitch chat, I used an application by lay295 and zigagrcar on GitHub, found here.
The Digital Millennium Copyright Act is affecting Twitch in a big way.
Twitch is currently in hot water over DMCA claims and is banning streamers for repeatedly streaming “copyrighted” songs. One method streamers use to combat this is deleting their content shortly after it is broadcast. This affected data collection, since the collected Twitch clips were actively being taken down.
This led to the collection of Twitch chat from 227 links present in the Reddit posts.
R and RSelenium were used to scrape the emote data from FrankerFaceZ and BetterTTV. Roughly the top 300 emotes from each site were collected (emote name and link to image).
what chat looks like
BTTV emotes need to be updated; also not sure if GIF emotes work.
# https://i.stack.imgur.com/kLMaS.jpg
# Build an HTML <img> tag for each emote so DT can render it inline.
test <- emote_data %>%
  mutate(emote_image = paste0('<img src="', emote_link, '" height="52">')) %>%
  select(emote_name, emote_image)
# escape = FALSE lets the HTML in emote_image render as an image.
datatable(test, escape = FALSE)
This chart shows how many unique chat lines there are per streamer. This metric is useful for understanding which streamers may be getting the most attention on LSF at a given point in time. That said, this metric should later be controlled for clip length, since longer clips offer more opportunity for chat engagement.
data %>% group_by(streamer) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x = reorder(streamer,-n), y = n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(size = 12, angle = 15,vjust = .55))+
labs(title = "Which streamer has the most chats?")
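One way the clip-length control mentioned above could look is a chat-lines-per-minute rate. The sketch below uses base R and made-up data; the `clip_id` and `clip_length_sec` columns are assumptions, since the collected data shown here does not include clip length.

```r
# Hypothetical chat log: one row per chat line, tagged with its clip.
chat <- data.frame(
  streamer = c("A", "A", "A", "A", "B", "B"),
  clip_id = c("a1", "a1", "a1", "a2", "b1", "b1"),
  stringsAsFactors = FALSE
)
# Hypothetical clip metadata: clip length in seconds (not in the original data).
clip_info <- data.frame(
  clip_id = c("a1", "a2", "b1"),
  streamer = c("A", "A", "B"),
  clip_length_sec = c(30, 60, 20),
  stringsAsFactors = FALSE
)

# Total chat lines and total clip seconds per streamer.
lines_per_streamer <- aggregate(
  n_lines ~ streamer,
  data = transform(chat, n_lines = 1),
  FUN = sum
)
secs_per_streamer <- aggregate(clip_length_sec ~ streamer, data = clip_info, FUN = sum)

# Chat lines per minute of clip footage, highest rate first.
rate <- merge(lines_per_streamer, secs_per_streamer, by = "streamer")
rate$lines_per_min <- rate$n_lines / rate$clip_length_sec * 60
rate <- rate[order(-rate$lines_per_min), ]
```

In this toy data, streamer A has more chat lines in total, but streamer B has the higher rate once clip length is taken into account, which is the kind of reordering the raw counts can hide.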
This visualization shows us the most active Twitch chatters in our dataset. In a larger dataset, finding these high-interaction chatters may be useful for drawing links between communities or even creating contributor badges on Twitch (like the founders badge).
# This creates a !%in% kind of deal
`%notin%` <- Negate(`%in%`)
data %>% group_by(user) %>% filter(user %notin% c("StreamElements","Streamlabs","Nightbot")) %>% count(sort = T) %>%
head(n=10)%>%
ggplot(aes(x = reorder(user,-n), y = n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(size = 8, angle = 15,vjust = .55),
plot.title = element_text(size = 20))+
labs(title = "Which user has the most chats?")
# Streamelements and streamlabs are bots.
REVISIT THIS ONE
data %>% group_by(streamer,user) %>% count(sort = T) %>% head(n = 10)
## # A tibble: 10 x 3
## # Groups: streamer, user [10]
## streamer user n
## <chr> <chr> <int>
## 1 ZeratoR Crackmort 74
## 2 EsfandTV Eltefan 54
## 3 Trainwreckstv Homie_from_compton 46
## 4 Mizkif Gekon 42
## 5 EsfandTV WaterLaws 39
## 6 Trainwreckstv LeeqoX 36
## 7 EsfandTV newmanji 32
## 8 EsfandTV Nevarixxx 31
## 9 EsfandTV IcePal 30
## 10 Trainwreckstv juniorrr 30
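The streamer/user counts above could be extended to look for chatters who appear in more than one community. The following is a minimal base R sketch on made-up data (the `chat` data frame here is hypothetical, mirroring only the `streamer` and `user` columns); it counts the users each pair of streamers has in common.

```r
# Hypothetical chat log with the same two columns used above.
chat <- data.frame(
  streamer = c("A", "A", "B", "B", "C", "A"),
  user = c("u1", "u2", "u1", "u3", "u1", "u1"),
  stringsAsFactors = FALSE
)

# One row per distinct streamer/user pair, so repeat chatters count once.
pairs <- unique(chat[, c("streamer", "user")])

# Streamer-by-user incidence matrix (1 = user chatted in that channel).
incidence <- unclass(table(pairs$streamer, pairs$user))

# incidence %*% t(incidence): entry [i, j] is the number of users shared
# by streamers i and j; the diagonal is each streamer's distinct chatters.
shared <- tcrossprod(incidence)
```

The off-diagonal entries of `shared` could serve as edge weights for a community network, though whether shared chatters are a good measure of community overlap is an open question for this dataset.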
This plot gives further insight into the demographics of the top 5 streamers' communities. It shows the number of accounts created per year among the chat members of each streamer. As an example, one conclusion that may be drawn is that the streamers forsen and Mizkif are not attracting new accounts (new users/ban evaders) to their channels. Another is that Trainwreckstv attracted a lot of new users in 2018 and perhaps played a significant role in bringing new users to Twitch. I should investigate further to understand what happened with Train in 2018. This was perhaps the year of his drama with MitchJones (a popular WoW streamer) or The Speech.
top_5_streamers <- data %>% group_by(streamer) %>% count(sort = T) %>% head(n=5) %>% distinct(streamer)
data %>%
  filter(streamer %in% top_5_streamers$streamer) %>%
  # Use the as.Date() generic rather than calling the S3 method directly.
  mutate(date_year = year(as.Date(date))) %>%
  group_by(date_year, streamer) %>%
  count(sort = T) %>%
ggplot(aes(x = date_year, y = n, color = streamer)) +
geom_line(size = 2)+
theme_wsj(base_size = 12, color = "green")+
labs(title = "Streamer Communities: Account Creation Dates", subtitle = "Top 5 Streamers")+
theme(plot.title = element_text(size = 15),plot.subtitle = element_text(size= 8),legend.title = element_blank(),legend.position = "bottom")
Tokens, bigrams, and trigrams can give us insight into the popular emotes, words, and spam that occur in these chats.
tokens <- data %>%
unnest_tokens(word,body)%>%
filter(str_detect(word,"^[:alpha:]"))
tokens %>% glimpse(width = 50)
## Rows: 159,517
## Columns: 4
## $ user <chr> "Humorous_Chimp", "mayodongs...
## $ date <dttm> 2013-03-29 16:35:36, 2019-1...
## $ streamer <chr> "Jerma985", "Jerma985", "Jer...
## $ word <chr> "omegalul", "out", "of", "mo...
tokens %>% group_by(word) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x= reorder(word,-n),y=n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(angle = 25))+
labs(title ="Token Counts")
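To separate emotes from ordinary words in the token counts, the tokens could be matched against the scraped emote names. The sketch below is base R on made-up vectors standing in for `tokens$word` and `emote_data$emote_name`; the specific words and emote names are illustrative only.

```r
# Hypothetical stand-ins for tokens$word and emote_data$emote_name.
words <- c("omegalul", "out", "of", "pog", "kekw", "money")
emote_names <- c("OMEGALUL", "KEKW", "monkaS")

# unnest_tokens() lowercases text by default, so compare in lowercase.
is_emote <- words %in% tolower(emote_names)

# Frequency of each token that matched an emote name.
emote_counts <- sort(table(words[is_emote]), decreasing = TRUE)
```

Because emote names are case-sensitive on Twitch, lowercasing can merge an emote with an ordinary word of the same spelling. Passing `to_lower = FALSE` to `unnest_tokens()` would be one way to keep the original case, at the cost of splitting word counts by capitalization.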
data %>%
unnest_tokens(bigram,body,token = 'ngrams',n = 2)%>%
filter(str_detect(bigram,"^[:alpha:]")) %>%
group_by(bigram) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x= reorder(bigram,-n),y=n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(angle = 25,size = 9))+
labs(title ="Bigram Counts")
data %>%
unnest_tokens(trigram,body,token = 'ngrams',n = 3)%>%
filter(str_detect(trigram,"^[:alpha:]")) %>%
group_by(trigram) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x= reorder(trigram,-n),y=n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(angle = 25,size = 9))+
labs(title ="Trigram Counts")